1. INTRODUCTION

English is a universal language that is widely used in science [1] and in technology. English-to-Arabic
neural machine translation (NMT) is particularly important and is mainly based on the transfer
classification. Comparatively little work has been done on machine translation (MT) systems involving
Arabic as the source or target language [2]. MT based on neural networks has several advantages over
other approaches and is one of the most widely explored areas of MT [3]. MT systems are very sensitive
to the domains they are trained on, because each domain has its own style, terminology, and sentence
structure [4]. Word ambiguity is often a problem in MT systems [5]. For example, the English word
“frequency” must be translated differently depending on whether it occurs in a technical or an economic
context. The main idea of our work is that neural models can use domain information to select the most
appropriate terms and sentence structures, while using data from all domains to improve baseline
translation quality.

Hadla et al. [6] reported that improving the performance of neural networks in MT systems is an
important area of research for optimizing [7] efficiency in the sulfur industry. We have extended this idea
to domain management. Our goal is to enable models trained on different data to produce translations
within the domain [8]. This means extending general NMT models to specific domains and their specific
concepts and styles [9] without compromising the quality of translation of more general content.

Previous work has shown that an NMT model can learn attention distributions that intuitively
explain plausible correspondences between the source and target languages [10]. The literature in this area
indicates that little work has been conducted with Arabic as the target language. Statistical machine
translation (SMT) was the main translation paradigm for decades [11]. Even before the advent of direct
neural machine translation, neural networks were successfully used as components of SMT systems.
Perhaps one of the most significant experiments involved the use of a common language model to learn
sentence representations [12], which led to a dramatic improvement in sentence-based translation and
extended sentence systems.

Many new techniques have been proposed to improve MT, for example handling rare words, various
attention mechanisms [13], and minimizing sentence-level loss. Recent work has also dealt specifically with
domain adaptation for NMT by providing metadata to the neural network [14]. The present work follows
this kind of approach, and the translation accuracy of the resulting system is satisfactory. One line of work
feeds the neural network with topic information on the decoder side; the topics are varied and consist of
human-labeled product categories [15]. Another includes topic modeling in both the encoder and decoder
components: a given number of topics is automatically extracted from the training data with a latent
Dirichlet allocation (LDA) model, and each word in the sentence gets its own topic vector. In our work,
we also provide metadata and information about the domain [16].
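
One common way to supply domain metadata to an NMT model is to prepend a reserved domain token to
each source sentence. The sketch below illustrates that idea only. The tag format, the function name
`tag_source`, and the domain labels are assumptions made for illustration; they are not taken from the
system described in this paper.

```python
# Illustrative sketch: attach a domain token to a source sentence.
# The tag format and the domain names are hypothetical.
KNOWN_DOMAINS = {"sulfur_industry", "general"}


def tag_source(sentence: str, domain: str) -> str:
    """Return the sentence prefixed with a reserved domain token."""
    if domain not in KNOWN_DOMAINS:
        raise ValueError(f"unknown domain: {domain}")
    return f"<domain:{domain}> {sentence.strip()}"


tagged = tag_source("The frequency of the pump is adjustable.", "sulfur_industry")
print(tagged)
```

With such a tag, the same encoder can be trained on data from all domains while still receiving an explicit
signal about which terminology and style to prefer.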


2. METHOD 

NMT is a technique based on neural networks that models the conditional probability of a
target-language sentence given the source-language sentence [17]; it is widely used in deep learning for
MT. A sequence-to-sequence architecture is used in MT models to learn the relationship between the two
languages of a pair [18]. The NMT system architecture is shown in Figure 1. The algorithm used for
English-to-Arabic translation can be explained with the help of this diagram [19]. Figure 2 illustrates the
architecture of the proposed DIA translator system.

Figure 1. Architecture of the neural machine translation system


2.1. Encoder model 

The source language analysis (English) prepares the input data appropriately for machine
translation. The input is the raw material for the whole system: a text file containing a well-structured
collection of sentences written in English. The effectiveness of source language analysis can be increased
by three steps: morphological analysis, syntactic analysis (parse tree), and semantic analysis [20].

The source words are first mapped to word vectors and then fed into a bidirectional recurrent neural
network (RNN) that reads the input sentence S = {w1, w2, w3, ..., wn}. At each step, the network receives
one element of the input sequence, processes it, and accumulates and propagates information about that
element. The encoder maps the input sequence into a vector space on which the neural network
computations are performed. Since words form a meaningful sequence, a recurrent neural network is
suitable for this task. The problem with a unidirectional RNN is that it does not fully capture grammatical
complexity: when encoding the nth word of the source sentence, the RNN considers only words 1..n of the
original sentence, but the grammatical meaning of a word also depends on the words that come after it.
Using a bidirectional model allows the meaning of both past and future words to be included in the
encoder output vector. This raises a further question: which words should the model focus on? The
authors of [21] showed that the model can learn which source-language words to focus on by storing the
previous outputs in long short-term memory (LSTM) units, then scoring each of them and selecting the
words with the highest scores.

Figure 2. Architecture of the DIA translator system
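
The benefit of reading the sentence in both directions can be illustrated with a very small sketch. It is
not the encoder used in the DIA system: the scalar weights `w_in` and `w_rec`, the toy random
embeddings, and the function names are all assumptions chosen to keep the example short. A real
encoder would use learned weight matrices and LSTM units.

```python
import math
import random


def rnn_pass(embeddings, w_in, w_rec):
    """Run a simple recurrence with scalar weights over a list of vectors."""
    hidden = [0.0] * len(embeddings[0])
    states = []
    for vec in embeddings:
        hidden = [math.tanh(w_in * x + w_rec * h) for x, h in zip(vec, hidden)]
        states.append(hidden)
    return states


def bidirectional_encode(embeddings, w_in=0.5, w_rec=0.3):
    """Concatenate left-to-right and right-to-left states for each position."""
    forward = rnn_pass(embeddings, w_in, w_rec)
    backward = rnn_pass(embeddings[::-1], w_in, w_rec)[::-1]
    return [f + b for f, b in zip(forward, backward)]


random.seed(0)
sentence = ["sulfur", "is", "melted"]
# Toy 4-dimensional embeddings; real systems learn these vectors.
embed = {w: [random.uniform(-1, 1) for _ in range(4)] for w in sentence}
states = bidirectional_encode([embed[w] for w in sentence])
print(len(states), len(states[0]))
```

Each position ends up with a vector whose first half summarizes the words up to that position and whose
second half summarizes the words from that position to the end of the sentence.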


2.2. Attention decoder model 

Encoder-decoder models with attention were proposed and have since become a de facto standard in
neural machine translation [22]. This part explains target language generation (Arabic) and how output
texts are produced by the machine translation system. The effectiveness of generating the equivalent
target-language text can be increased by two steps: sentence reordering and semantic processing [23].
Let y = {y1, y2, y3, ..., yT} be the target sentence and x = {x1, ..., xS} the source sentence. In the decoding
process, the next word is predicted from the previously generated words and the source sentence. Using
the chain rule, the distribution over target sentences can be factorized from left to right, as in (1). The
attention mechanism is part of the neural network: it determines which parts of the source are most
important at each decoder step. The encoder therefore does not need to compress the entire source
sentence into a single vector; it provides representations for all source tokens.


$$P(y \mid x) = \prod_{t=1}^{T} P(y_t \mid y_0, \ldots, y_{t-1}, x_1, \ldots, x_S) \qquad (1)$$


NMT models that conform to (1) are referred to as left-to-right (L2R) autoregressive NMT [24]-[26],
because the prediction at time step t is taken as an input at time step t+1. The model applies attention
over the sequence of encoder states, and the attention weights determine how information from different
positions is combined. This framework is well suited to our study because we emphasize the ability of
NMT to capture contextual dependencies from a broader context beyond sentence boundaries.
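
The left-to-right factorization in (1) can be made concrete with a toy example. The table `TOY_MODEL`
below is a hypothetical stand-in for a trained decoder and, for brevity, conditions only on the previous
token; a real NMT decoder conditions on the whole target prefix and on the source sentence.

```python
import math

# Hypothetical conditional probabilities P(next token | previous token).
TOY_MODEL = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"plant": 0.7, "</s>": 0.3},
    "a": {"plant": 0.5, "</s>": 0.5},
    "plant": {"</s>": 1.0},
}


def greedy_decode(max_len=10):
    """Pick the most probable token at each step and feed it back in."""
    tokens, log_prob, prev = [], 0.0, "<s>"
    for _ in range(max_len):
        nxt, p = max(TOY_MODEL[prev].items(), key=lambda kv: kv[1])
        log_prob += math.log(p)  # the product in (1), accumulated in log space
        if nxt == "</s>":
            break
        tokens.append(nxt)
        prev = nxt  # the prediction at step t becomes the input at step t+1
    return tokens, log_prob


tokens, log_prob = greedy_decode()
print(tokens, round(math.exp(log_prob), 4))
```

Summing log-probabilities instead of multiplying probabilities avoids numerical underflow on long
sentences while representing the same product.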

Local attention is chosen so that the model attends to a subset of the hidden encoder states for each
target word. The model first generates an alignment position p(t) for each target word at time t, and
learns these alignment positions as part of the attention mechanism. This enables efficient GPU-based
training and decoding with mini-batches, and it handles cases where the word order of the translation
differs from that of the original sentence (for example, source word 1 can be word 4 in the translated
sentence).
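
A minimal sketch of attention restricted to a window around an alignment position is given below. The
dot-product scoring, the window half-width, and all names and numbers are assumptions for illustration;
the text does not specify how the DIA system scores encoder states or predicts p(t), so the center is
simply passed in as an argument.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def local_attention(decoder_state, encoder_states, center, half_width):
    """Attend only to encoder states in [center - half_width, center + half_width]."""
    lo = max(0, center - half_width)
    hi = min(len(encoder_states), center + half_width + 1)
    window = encoder_states[lo:hi]
    scores = [sum(d * e for d, e in zip(decoder_state, h)) for h in window]
    weights = softmax(scores)
    dim = len(decoder_state)
    context = [sum(w * h[i] for w, h in zip(weights, window)) for i in range(dim)]
    return lo, weights, context


# Five toy encoder states of dimension 2 and one decoder state.
encoder_states = [[0.1, 0.0], [0.0, 0.2], [0.5, 0.1], [0.9, 0.4], [0.2, 0.8]]
decoder_state = [1.0, 0.5]
start, weights, context = local_attention(decoder_state, encoder_states, center=3, half_width=1)
print(start, [round(w, 3) for w in weights])
```

The weights sum to one over the window, and the context vector is the weighted average of the selected
encoder states, which the decoder then uses to predict the next target word.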

The following algorithm includes three main steps used in the machine translation (MT) system: 

1st step: encoder network.
- Input (source text).
- Semantic source text.
2nd step: attention-decoder network.
- Target text generation.
- Optimize the target text.
3rd step: evaluation and rank.
- Evaluate the DIA translator against Google translator.
- Rank the DIA translator and Google translator from best to worst evaluation.
At each time step, the encoder or decoder takes an input and generates an output for that step. Herein
lies the limitation of classic sequence-to-sequence models: the encoder is “forced” to send only a single
vector, regardless of the length of the input, and the model overfits on all sequences.


3. RESULTS AND DISCUSSION 

This study is based on a dataset constructed from sulfur production company catalogs. The dataset
contains 1,200 English-Arabic sentences, each with a reference translation, divided into four main
categories (terms, phrases, text, and general text). The average precision results of the DIA translator and
Google translator for each category are given in Table 1 and Figure 3.


Table 1. Human evaluation average precision for each type 


MT system / Criterion   Terms by domain   Phrase by domain   Text by domain   Text without domain   Average precision
DIA MT system           0.85              0.80               0.73             0.61                  0.793
Google translator       0.33              0.44               0.39             0.75                  0.387

Figure 3. Average precision of evaluation systems


Evaluation is vital in judging the success or failure of research work. This study therefore uses the
sulfur industry domain, and the system was evaluated by the specialized English-Arabic translation center
at the University of Tikrit against Google translator. The comparison between the DIA translator and
Google translator indicates that:

— Google translator does not support chemical symbols, while the (DIA translator) system supports
chemical symbols in detail.

— In the case of (terms), it was found that the DIA translator system is much better than Google
translator. The reason for this is that the DIA translator system database contains a translation file of
English-Arabic terms.

— In the case of (phrases in the sulfur industry domain), it can be seen that the DIA MT system
outperforms the Google translate system in most cases.

— In the case of (text in the sulfur industry domain), it can be seen that the Google translate system is
in most cases inferior to the DIA MT system; the test results for Google's translation system show that
most of its translations of these texts are inaccurate.

— It should also be noted that while Google translator cannot accurately translate complex sentences
from the sulfur industry domain, the DIA translator can translate some of these sentences.

— In general texts (texts outside the sulfur industry domain), we note that the precision is the same for
some simple sentences. Google's translation system has a much wider range of application than the
DIA machine translation system, covering general texts and compound sentence structures.

Finally, it can be seen that the DIA MT system outperforms the Google translate system in most
in-domain cases, as shown in Table 1. From the results obtained, we can conclude that the DIA translator
produces better outputs than Google translator for the sulfur industry domain (0.387 for Google and 0.793
for DIA translator) in the tests conducted.
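
The averages quoted above appear to be the mean of the three in-domain columns of Table 1 (terms,
phrase, and text by domain), excluding the "text without domain" column. This is an inference from the
numbers rather than something the text states, and the short check below only reproduces the arithmetic
and the resulting ranking.

```python
from statistics import mean

# In-domain precision values copied from Table 1
# (terms, phrase, text by domain).
in_domain = {
    "DIA MT system": [0.85, 0.80, 0.73],
    "Google translator": [0.33, 0.44, 0.39],
}

averages = {name: round(mean(vals), 3) for name, vals in in_domain.items()}
# Rank the systems from best to worst average precision.
ranking = sorted(averages, key=averages.get, reverse=True)

print(averages)
print(ranking)
```

Under this reading, the rounded means match the 0.793 and 0.387 reported in the last column of Table 1.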


4. CONCLUSION 

In this work, sulfur industry domain information has been incorporated into an NMT system for one
of the most difficult language pairs (English-Arabic). From the results obtained above, it can be concluded
that domain information combined with byte pair encoding and pre-trained word embeddings yields better
translation than general English-Arabic translation techniques. The results also indicate that the accuracy
of the DIA MT system is approximately 79.3%, compared with approximately 38.67% for Google
translator, when the sulfur industry domain is used. Finally, from all the results obtained, we can conclude
that the DIA translator achieves fairly good accuracy and is able to outperform many baseline translation
systems.


FUTURE WORK 
Since domain classification is a document level task, it would be interesting to extend the current 
study to document level translation. 